Audiovisual Vocal Outburst Recognition in Noisy Acoustic Conditions
Authors
Abstract
In this study, we investigate an audiovisual approach to classifying vocal outbursts (non-linguistic vocalisations) in noisy conditions using Long Short-Term Memory (LSTM) Recurrent Neural Networks and Support Vector Machines. Geometric shape features and acoustic low-level descriptors are fused at the feature level. Three types of acoustic noise are considered: babble, office and street noise. Experiments are conducted on every noise type to assess the benefit of the fusion in each case. The INTERSPEECH 2010 Paralinguistic Challenge's Audiovisual Interest Corpus of natural human-to-human conversation serves as the evaluation database. The results show that even when training is performed on noise-corrupted audio that matches the test conditions, the addition of visual features is still beneficial.
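Feature-level fusion, as named in the abstract, can be illustrated with a minimal sketch: each instance's acoustic and visual feature vectors are concatenated into one vector before classification. The per-stream z-normalisation, the function names `standardize` and `fuse_feature_level`, and the toy values are assumptions made for illustration; the abstract does not state how the study normalised or combined its features in detail.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Z-normalise each feature column of a list of equal-length rows."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    # A constant column has zero deviation; fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(c) or 1.0 for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def fuse_feature_level(acoustic, visual):
    """Concatenate per-instance acoustic and visual vectors (hypothetical helper)."""
    if len(acoustic) != len(visual):
        raise ValueError("both streams need one feature vector per instance")
    # Normalise each stream separately so neither dominates by scale (an assumption).
    a = standardize(acoustic)
    v = standardize(visual)
    return [x + y for x, y in zip(a, v)]


# Toy data: two instances, three acoustic and two visual features each.
acoustic = [[0.2, 1.0, 3.0], [0.4, 1.0, 5.0]]
visual = [[10.0, 2.0], [14.0, 6.0]]
fused = fuse_feature_level(acoustic, visual)
```

Each fused vector has five dimensions here and could be passed to a single classifier such as an SVM. In a real experiment the normalisation statistics would normally be estimated on the training partition only and then applied to the test data.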
Similar resources
Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions
Automatic recognition of emotional states from speech in noisy conditions has become an important research topic in emotional speech recognition in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...
Noise alters beta-band activity in superior temporal cortex during audiovisual speech processing
Speech recognition is improved when complementary visual information is available, especially under noisy acoustic conditions. Functional neuroimaging studies have suggested that the superior temporal sulcus (STS) plays an important role for this improvement. The spectrotemporal dynamics underlying audiovisual speech processing in the STS, and how these dynamics are affected by auditory noise, ...
Impact of Vocal Tract Length Normalization on the Speech Recognition Performance of an English Vowel Phoneme Recognizer for the Recognition of Children Voices
Differences in human vocal tract lengths can cause inter-speaker acoustic variability in speech signals spoken by different speakers for the same text, and these variations affect the robustness of a speaker independent (SI) speech recognition system. Speaker normalization using vocal tract length normalization (VTLN) is an effective approach to reduce the effect of these...
Pattern recognition mediates flexible timing of vocalizations in nonhuman primates: experiments with cottontop tamarins
To maximize transmission in noisy environments, vocalizing animals have evolved capacities to avoid the masking effects of biotic and abiotic sound sources, such as changing the structure and timing of acoustic signals. Here we explore this problem from a new angle, asking whether animals can extract predictive acoustic cues from an intermittently noisy environment and use this information to g...
Evaluating robust features on deep neural networks for speech recognition in noisy and channel mismatched conditions
Deep Neural Network (DNN) based acoustic models have shown significant improvement over their Gaussian Mixture Model (GMM) counterparts in the last few years. While several studies exist that evaluate the performance of GMM systems under noisy and channel degraded conditions, noise robustness studies on DNN systems have been far fewer. In this work we present a study exploring both conventional...
Journal title:
Volume, issue:
Pages: -
Publication date: 2011